North Korean Neural Machine Translation through South Korean Resources
Abstract
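South and North Korea both use the Korean language. However, Korean natural language processing (NLP) research has mostly focused on South Korean. Therefore, existing NLP systems for the Korean language, such as neural machine translation (NMT) systems, cannot properly process North Korean inputs. Training a model on North Korean data is the most straightforward approach to solving this problem, but the data available to train NMT models are insufficient. To solve this problem, we constructed a parallel corpus for developing a North Korean NMT model from a comparable corpus. We manually aligned parallel sentences to create evaluation data and automatically aligned the remaining sentences to create training data. We trained a North Korean NMT model on our parallel data and improved its translation quality using South Korean resources such as parallel data and a pre-trained model. In addition, we propose Korean-specific pre-processing methods, character tokenization and phoneme decomposition, to use the South Korean resources more efficiently. We demonstrate that phoneme decomposition consistently improves North Korean translation accuracy compared to the other pre-processing methods.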
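The two Korean-specific pre-processing steps named in the abstract can be illustrated with a minimal Python sketch. It assumes that "character tokenization" means splitting text into precomposed Hangul syllables and that "phoneme decomposition" means splitting each syllable into its conjoining jamo using the standard Unicode Hangul arithmetic. The function names and the `▁` word-boundary marker are hypothetical and are not taken from the paper.

```python
import unicodedata
from typing import List

# Hangul syllable constants from the Unicode Standard (conjoining jamo behavior).
S_BASE = 0xAC00                      # first precomposed syllable, "가"
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28            # vowels; trailing consonants (incl. "none")
N_COUNT = V_COUNT * T_COUNT          # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT               # 11172 precomposed syllables


def char_tokenize(text: str) -> List[str]:
    """Hypothetical character tokenizer: one token per character, spaces -> '▁'."""
    return ["▁" if ch.isspace() else ch for ch in text]


def decompose_syllable(ch: str) -> List[str]:
    """Split one precomposed Hangul syllable into leading/vowel/(trailing) jamo."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]                  # not a Hangul syllable: leave unchanged
    lead = index // N_COUNT
    vowel = (index % N_COUNT) // T_COUNT
    tail = index % T_COUNT
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:                         # tail == 0 means no final consonant
        jamo.append(chr(T_BASE + tail))
    return jamo


def phoneme_decompose(text: str) -> List[str]:
    """Hypothetical phoneme-level tokenizer: every syllable becomes its jamo."""
    return [j for ch in text for j in decompose_syllable(ch)]


def recompose(jamo: List[str]) -> str:
    """Rebuild syllables from a jamo sequence via Unicode NFC composition."""
    return unicodedata.normalize("NFC", "".join(jamo))


if __name__ == "__main__":
    sample = "한국어 조선말"
    print(char_tokenize(sample))
    print(phoneme_decompose(sample))
    print(recompose(phoneme_decompose(sample)))
```

Because this arithmetic is the same one Unicode canonical decomposition uses, `unicodedata.normalize("NFD", ...)` yields the identical jamo sequence and NFC reverses it. Decomposing to jamo replaces thousands of possible syllables with a few dozen jamo symbols. How the paper combines these units with subword segmentation is not stated in this abstract.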
Related articles
Korean Adverb Ordering in English-Korean Machine Translation Using Clustering
This paper proposes an approach to determine the ordering of Korean adverb by using clustering method for making sentences more natural at the generation stage of English-Korean machine translation system. After observing the feature information of Korean adverb classified by scholars of Korean literature, we analyze an adverb ordering about the feature information. Afterwards, we extract conse...
Customizing an English-Korean Machine Translation System for Patent Translation
This paper addresses a method for customizing an English-to-Korean machine translation system from general domain to patent domain. The customizing method consists of following steps: 1) linguistically studying about characteristics of patent documents, 2) extracting unknown words from large patent documents and constructing large bilingual terminology, 3) extracting and constructing the patent...
Characteristics of Body Composition and Muscle Strength of North Korean Refugees during South Korean Stay
BACKGROUND The aim of this study was to investigate the changes of body composition and muscle strength of North Korean refugees (NKRs) according to their duration of stay in South Korea. METHODS NKRs who volunteered and were living in South Korea, aged 20 to 75 years were recruited. Body compositions were analyzed by bioelectrical impedance analysis. Muscle strength was measured with the han...
Contrastive Analysis and Feature Selection for Korean Modal Expression in Chinese-korean Machine Translation System
JIN-JI LI, JI-EUN ROH, DONG-IL KIM AND JONG-HYEOK LEE Department of Computer Science and Engineering, Electrical and Computer Engineering Division and Advanced Information Technology Research Center (AITrc) Pohang University of Science and Technology (POSTECH) San 31 Hyoja Dong, Pohang, 790-784, Korea Language Engineering Institute, Department of Computer, Electron and Telecommunication Enginee...
Korean NLP2RDF Resources
The aim of Linked Open Data (LOD) is to improve information management and integration by enhancing accessibility to the existing various forms of open data. The goal of this paper is to make Korean resources linkable entities. By using NLP tools, which are suggested in this paper, Korean texts are converted to RDF resources and they can be connected with other RDF triples. It is worth noticing...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2023
ISSN: ['2375-4699', '2375-4702']
DOI: https://doi.org/10.1145/3608947